Clustering is a widely used technique in data mining applications to discover patterns in 
the underlying data. Most traditional clustering algorithms are limited to handling datasets 
that contain either continuous or categorical attributes. However, datasets with mixed 
types of attributes are common in real life data mining problems. In this paper, we 
propose a distance measure that enables clustering data with both continuous and 
categorical attributes. This distance measure is derived from a p ... 
iDistance: An adaptive B+-tree based indexing method for nearest neighbor search 
H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang 
June 2005 ACM Transactions on Database Systems (TODS), Volume 30 issue 2 
Publisher: ACM Press 

Full text available: pdf (1.16 MB) Additional Information: full citation, abstract, references, index terms 

In this article, we present an efficient B+-tree based indexing method, called iDistance, 
for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance 
partitions the data based on a space- or data-partitioning strategy, and selects a reference 
point for each partition. The data points in each partition are transformed into a single 
dimensional value based on their similarity with respect to the reference point. This allows 
the points to be indexed using a B 
5 A novel feature selection method to improve classification of gene expression data 
Liang Goh, Qun Song, Nikola Kasabov 

January 2004 Proceedings of the second conference on Asia-Pacific bioinformatics - 
Volume 29 CRPIT '04 

Publisher: Australian Computer Society, Inc. 

Full text available: pdf (202.49 KB) Additional Information: full citation, abstract, references 

This paper introduces a novel method for minimum number of gene (feature) selection for 
a classification problem based on gene expression data with an objective function to 
maximise the classification accuracy. The method uses a hybrid of Pearson correlation 
coefficient (PCC) and signal-to-noise ratio (SNR) methods combined with an evolving 
classification function (ECF). First, the correlation coefficients between genes in a set of 
thousands, is calculated. Genes, that are highly correlated acro ... 